perf: cache hardware detection and optimize warm-path reuse (2/4)#1252

Open
davidberenstein1957 wants to merge 5 commits into master from davidberenstein1957/perf-2-hardware-cache

davidberenstein1957 (Collaborator) commented Jun 21, 2026

Summary

Part 2/4 of the tracker performance stack. Depends on #1251.

cc @inimaz — this is the hardware-cache layer of the #1246 split.

Adds process-level hardware reuse and faster detection so repeat tracker runs in the same Python process skip repeated probing:

  • New hardware_cache.py with HardwareKind enum and per-process setup cache
  • Reuse CPU/GPU/RAM detection across tracker instances (get_or_run_setup)
  • Cache probe results via @lru_cache (Power Gadget, PowerMetrics, NVIDIA, ROCm)
  • Platform-aware CPU backend order (Mac ARM prefers fast cpu_load before PowerMetrics)
  • Parallel CPU/GPU setup with ThreadPoolExecutor
  • Global cpu_percent prime once per process for cpu_load mode
  • PowerMetrics sudo check timeout (3 s)
  • FAQ note on warm hardware reuse within one process
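
The per-process setup cache described in the list above can be pictured roughly as follows. This is a minimal sketch, not the code in hardware_cache.py: the enum members, the module-level dictionary, the lock, and the signature of `get_or_run_setup` are all assumptions made for illustration, since the PR description only names `HardwareKind` and `get_or_run_setup`.

```python
import threading
from enum import Enum
from typing import Any, Callable, Dict


class HardwareKind(Enum):
    # Member names are illustrative; the PR description does not list them.
    CPU = "cpu"
    GPU = "gpu"
    RAM = "ram"


# Module-level state lives for the whole Python process (assumed layout).
_setup_cache: Dict[HardwareKind, Any] = {}
_setup_lock = threading.Lock()


def get_or_run_setup(kind: HardwareKind, setup: Callable[[], Any]) -> Any:
    """Return the cached result for `kind`, running `setup` at most once."""
    with _setup_lock:
        if kind not in _setup_cache:
            _setup_cache[kind] = setup()
        return _setup_cache[kind]


probe_calls = []


def detect_cpu() -> str:
    probe_calls.append("cpu")  # record every real probe
    return "cpu_load"


first = get_or_run_setup(HardwareKind.CPU, detect_cpu)   # runs the probe
second = get_or_run_setup(HardwareKind.CPU, detect_cpu)  # served from cache
```

A single lock keeps the sketch short but serializes setups of different kinds; combining it with parallel CPU/GPU setup would need finer-grained locking, such as one lock per kind.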

Benchmarks (measured locally, offline Mac ARM, 2026-06-21)

Cumulative with #1251; fresh subprocess for cold metrics.

| Metric | master | #1251 only | #1251 + this PR | Δ vs master |
| --- | --- | --- | --- | --- |
| Tracker `__init__` p50 | 190 ms | 1.9 ms | 1.2 ms | ~99% faster |
| Cold lifecycle p50 (init→start→stop) | 1,685 ms | 1,767 ms | 408 ms | ~76% faster (~4×) |
| Warm lifecycle best (same process) | 1,565 ms | 1,561 ms | 4.8 ms | ~325× faster |
| CLI monitor subprocess p50 | 850 ms | 786 ms | 694 ms | ~18% faster |

Stack

  1. perf: defer tracker initialization and slim import path (1/4) #1251 — lazy initialization & import slimming
  2. This PR — hardware detection cache & warm-path reuse
  3. perf: defer API run creation until first emission upload (3/4) #1253 — lazy API run creation
  4. perf: speed up CLI monitor startup and fix wrapped commands (4/4) #1254 — CLI monitor startup

CI / quality

Test plan

  • Reviewer: second tracker in same process should reuse cached hardware, not re-probe
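
One way a reviewer might check the reuse behaviour is to count probe invocations with a mock. The sketch below is self-contained and does not use the real tracker: `SketchTracker` and the `_cache` dictionary are hypothetical stand-ins, and only `unittest.mock.Mock` is a real API.

```python
from unittest import mock

_cache = {}  # stand-in for the process-level hardware cache


class SketchTracker:
    """Hypothetical stand-in for a tracker that probes hardware on init."""

    def __init__(self, probe):
        if "hardware" not in _cache:
            _cache["hardware"] = probe()  # only the first instance probes
        self.hardware = _cache["hardware"]


probe = mock.Mock(return_value={"cpu": "cpu_load", "gpu": None})
first = SketchTracker(probe)
second = SketchTracker(probe)

# The second instance must reuse the cached result, not re-probe.
assert probe.call_count == 1
assert second.hardware is first.hardware
```

Against the real code, the same idea would apply by patching whichever probe function the tracker calls and asserting on its call count after constructing two trackers.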

Replaces

Split from #1246.

Lazy-load reference data, output handlers, hardware probing, and the
emissions engine so tracker construction stays fast when work is deferred.

Co-authored-by: Cursor <cursoragent@cursor.com>

codecov Bot commented Jun 21, 2026

Codecov Report

❌ Patch coverage is 95.39171% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.27%. Comparing base (62d2bcd) to head (00817d8).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| codecarbon/core/resource_tracker.py | 86.00% | 7 Missing ⚠️ |
| codecarbon/core/hardware_cache.py | 97.63% | 3 Missing ⚠️ |
Additional details and impacted files
@@                           Coverage Diff                            @@
##           davidberenstein1957/perf-1-lazy-init    #1252      +/-   ##
========================================================================
- Coverage                                 89.39%   89.27%   -0.12%     
========================================================================
  Files                                        47       48       +1     
  Lines                                      4565     4757     +192     
========================================================================
+ Hits                                       4081     4247     +166     
- Misses                                      484      510      +26     


davidberenstein1957 (Collaborator, Author) commented

@inimaz #1246 has been split into this 4-PR stack for easier review. Benchmarks in the description are measured locally (Mac ARM, 2026-06-21); pre-commit and tests pass on each branch. Please start with #1251.

Deferred hardware setup in PR 1 requires _ensure_hardware_ready() in tests
that inspect tracker._hardware before start().

Co-authored-by: Cursor <cursoragent@cursor.com>

davidberenstein1957 force-pushed the davidberenstein1957/perf-2-hardware-cache branch from 1f759d9 to 9d1c656 on June 21, 2026 14:37

davidberenstein1957 (Collaborator, Author) commented

Rebased on #1251 and hardened conftest to reset the probes' lru_cache state between tests.

Verified locally: 530 passed, pre-commit clean.

davidberenstein1957 and others added 3 commits June 22, 2026 08:37
Use TYPE_CHECKING import for pd so lazy loading stays fast while keeping
return type annotations on get_cloud_emissions_data and get_cpu_power_data.

Co-authored-by: Cursor <cursoragent@cursor.com>
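
The TYPE_CHECKING pattern mentioned in this commit can be sketched as below. The function name `load_power_table` and its body are hypothetical; only the pattern itself (a guarded import for annotations plus a deferred runtime import) is what the commit message describes.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime,
    # so importing this module does not import pandas.
    import pandas as pd


def load_power_table(csv_path: str) -> "pd.DataFrame":
    """Hypothetical loader: pandas is imported on first call."""
    import pandas as pd  # deferred runtime import

    return pd.read_csv(csv_path)
```

The quoted return annotation keeps the type visible to tooling while the module itself stays cheap to import.
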
Add process-level hardware setup cache, probe result caching, platform-aware
CPU backend selection, and parallel CPU/GPU setup for faster repeat runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
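
The parallel CPU/GPU setup named in this commit might look roughly like the following. The two setup functions are placeholders that sleep instead of probing, and `setup_hardware` is an assumed name; the sketch only illustrates submitting both setups to a `ThreadPoolExecutor` and collecting the results.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def setup_cpu() -> str:
    time.sleep(0.05)  # stand-in for a slow CPU backend probe
    return "cpu-ready"


def setup_gpu() -> str:
    time.sleep(0.05)  # stand-in for a slow GPU backend probe
    return "gpu-ready"


def setup_hardware() -> dict:
    """Run both setups concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_future = pool.submit(setup_cpu)
        gpu_future = pool.submit(setup_gpu)
        # result() blocks until done and re-raises any probe exception.
        return {"cpu": cpu_future.result(), "gpu": gpu_future.result()}


hardware = setup_hardware()
```

Threads suit this case because probes of this kind mostly wait on subprocesses or drivers rather than on Python bytecode.
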
Import GPU/CPU probe modules before clearing caches so lru_cache state
does not leak across tests in the full suite.

Co-authored-by: Cursor <cursoragent@cursor.com>
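
The cache reset described in this commit relies on the `cache_clear()` method that `functools.lru_cache` attaches to every wrapped function. The sketch below uses a made-up probe, `is_nvidia_available`, and a hypothetical `reset_probe_caches` helper. In a conftest.py this helper would presumably be called from an autouse fixture, which is an assumption and is not shown here.

```python
from functools import lru_cache

probe_runs = []


@lru_cache(maxsize=None)
def is_nvidia_available() -> bool:
    """Made-up probe; records each real execution."""
    probe_runs.append("nvidia")
    return False


# The wrappers exist only once their modules are imported, which is
# why probe modules need importing before their caches can be cleared.
PROBES = (is_nvidia_available,)


def reset_probe_caches() -> None:
    """Drop cached probe results so the next call probes again."""
    for probe in PROBES:
        probe.cache_clear()


is_nvidia_available()  # runs the probe
is_nvidia_available()  # served from the cache
reset_probe_caches()
is_nvidia_available()  # runs the probe again after the reset
```
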
davidberenstein1957 force-pushed the davidberenstein1957/perf-2-hardware-cache branch from 9d1c656 to 00817d8 on June 22, 2026 07:49
Base automatically changed from davidberenstein1957/perf-1-lazy-init to master June 22, 2026 08:20